The likelihood is the probability of the observed outcome (i.e. the data) given a particular choice of parameters. For a particular statistical model, maximum likelihood finds the set of parameters that makes the observed data most likely to have occurred. That is, we find the set of parameters that makes the likelihood as large as possible.
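To make this concrete, here is a minimal sketch using a made-up sample and a simple model: data assumed to come from a normal distribution with known \(\sigma = 1\) and unknown mean \(\mu\). We evaluate the likelihood over a grid of candidate values of \(\mu\) and keep the one that makes the observed data most likely (the data values and grid are illustrative, not from the diagram):

```python
import math

# Hypothetical observed data
data = [4.2, 5.1, 4.8, 5.5, 4.9]

def likelihood(mu, sigma, xs):
    """Product of normal densities: P(data | mu, sigma)."""
    l = 1.0
    for x in xs:
        l *= math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
    return l

# Grid of candidate means from 4.00 to 6.00 in steps of 0.01
candidates = [mu / 100 for mu in range(400, 601)]

# The parameter value that makes the likelihood as large as possible
best_mu = max(candidates, key=lambda mu: likelihood(mu, 1.0, data))

# For this model, the maximum-likelihood estimate of mu is the sample mean
sample_mean = sum(data) / len(data)
```

In practice the maximum is found analytically or with a numerical optimiser rather than a grid, but the grid makes the idea of "searching for the most likely parameters" explicit.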
The diagram above shows what happens when you calculate the maximum likelihood. Here, we have a line of best fit, with the estimated values of the response variable \(\hat y_{1...n}\) shown as red dots. The actual values of the response variable (our data) are represented by the black dots. The residuals, indicated by \(\epsilon\), are the distances between the actual values of the response variable and the estimated values.
When we calculate the maximum likelihood, we are looking for the parameters that maximise the likelihood of the data. The horizontal arrows trace up to the normal distribution that represents the fit: the closer a data point falls to the peak of the distribution, the "more likely" that point is given the parameters.
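The "closer to the peak, more likely" idea can be sketched by evaluating the normal density at a few hypothetical residuals (these residual values are made up for illustration; the diagram's actual values are not available):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution: the curve the arrows trace up to."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical residuals epsilon_i = y_i - y_hat_i from a fitted line
residuals = [0.1, -0.4, 1.2, -0.2]

# A small residual sits near the peak of the distribution, so its density
# (its contribution to the likelihood) is high; a large residual sits in
# the tail and contributes much less
densities = [normal_pdf(e) for e in residuals]
```

The residual of 0.1 lands near the peak and gets a high density, while the residual of 1.2 falls in the tail and gets a low one, which is exactly why parameters producing small residuals yield a large likelihood.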
For mathematical convenience, we often work with the logarithm of the likelihood (the log-likelihood) rather than the likelihood itself. Because the logarithm is a monotonically increasing function, the parameters that maximise the log-likelihood also maximise the likelihood.
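A short sketch can confirm that the two maximisers agree. Using the same hypothetical normal model with known \(\sigma = 1\), we search a grid of candidate means with both the likelihood and the log-likelihood (the log turns the product of densities into a sum, which is numerically far more stable for large samples):

```python
import math

data = [4.2, 5.1, 4.8, 5.5, 4.9]  # hypothetical sample
candidates = [mu / 100 for mu in range(400, 601)]

def likelihood(mu, sigma, xs):
    """Product of normal densities over the sample."""
    l = 1.0
    for x in xs:
        l *= math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
    return l

def log_likelihood(mu, sigma, xs):
    """Sum of log densities: log turns the product into a sum."""
    return sum(
        -((x - mu) ** 2) / (2 * sigma ** 2) - math.log(sigma * math.sqrt(2 * math.pi))
        for x in xs
    )

best_l = max(candidates, key=lambda mu: likelihood(mu, 1.0, data))
best_ll = max(candidates, key=lambda mu: log_likelihood(mu, 1.0, data))
# Because log is monotonically increasing, both searches pick the same mu
```

With hundreds or thousands of data points the raw likelihood underflows to zero in floating point, while the log-likelihood stays well-scaled, which is the practical reason the log form is preferred.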